OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

Script identification from Indian documents

Internal identifier: 001152 (Main/Exploration); previous: 001151; next: 001153

Authors: Gopal Datt Joshi [India]; Saurabh Garg [India]; Jayanthi Sivaswamy [India]

Source: Lecture notes in computer science (ISSN 0302-9743), 2006

RBID: Pascal:08-0029049

French descriptors

English descriptors

Abstract

Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this paper, we present a scheme to identify different Indian scripts from a document image. The scheme employs hierarchical classification using features consistent with human perception. These features are extracted from the responses of a multi-channel log-Gabor filter bank designed at an optimal scale and multiple orientations. In the first stage, the classifier groups the scripts into five major classes using global features. In the second stage, a sub-classification is performed based on script-specific features. All features are extracted globally from a given text block, so no complex and reliable segmentation of the document image into lines and characters is required. The proposed scheme is therefore efficient and can be used in many practical applications that require processing large volumes of data. The scheme has been tested on 10 Indian scripts and found to be robust to the skew introduced during scanning and relatively insensitive to changes in font size. The proposed system achieves an overall classification accuracy of 97.11% on a large test data set. These results establish the utility of a global approach to script classification.
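The abstract does not give the filter-bank parameters or the exact features, so the following is only a minimal Python/NumPy sketch of the general idea, not the authors' implementation: a single-scale log-Gabor bank built in the frequency domain at several orientations, with the mean response magnitude per orientation pooled over the whole text block as a global feature vector. The wavelength (8 pixels), radial bandwidth ratio (0.55), six orientations and angular spread are illustrative assumptions, and the function names `log_gabor_bank` and `orientation_energy` are hypothetical.

```python
import numpy as np


def log_gabor_bank(size, wavelength=8.0, sigma_on_f=0.55,
                   n_orient=6, theta_sigma=np.pi / 6):
    """Build single-scale log-Gabor filters in the frequency domain.

    All default parameter values are illustrative assumptions, not the
    settings used in the paper. Filters are returned in unshifted FFT
    order so they can multiply np.fft.fft2 output directly.
    """
    rows, cols = size
    # Normalised frequency coordinates, zero frequency at the array centre
    u = (np.arange(cols) - cols // 2) / cols
    v = (np.arange(rows) - rows // 2) / rows
    U, V = np.meshgrid(u, v)
    radius = np.hypot(U, V)
    radius[rows // 2, cols // 2] = 1.0  # avoid log(0) at the DC term
    theta = np.arctan2(-V, U)

    f0 = 1.0 / wavelength  # centre frequency of the single scale
    radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_on_f) ** 2))
    radial[rows // 2, cols // 2] = 0.0  # a log-Gabor filter has no DC response

    bank = []
    for k in range(n_orient):
        angle = k * np.pi / n_orient
        # Angular distance from the filter orientation, wrapped to [-pi, pi]
        d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
        angular = np.exp(-d_theta ** 2 / (2 * theta_sigma ** 2))
        bank.append(np.fft.ifftshift(radial * angular))
    return bank


def orientation_energy(image, bank):
    """Mean response magnitude per orientation, normalised to sum to 1.

    Pooling over the whole block gives a global feature that needs no
    line or character segmentation.
    """
    spectrum = np.fft.fft2(image - image.mean())
    energy = np.array([np.abs(np.fft.ifft2(spectrum * h)).mean() for h in bank])
    return energy / (energy.sum() + 1e-12)


if __name__ == "__main__":
    # Synthetic "text block": one bright row every 8 pixels, a crude
    # stand-in for evenly spaced text lines
    block = np.zeros((64, 64))
    block[::8, :] = 1.0
    features = orientation_energy(block, log_gabor_bank(block.shape))
    print(np.round(features, 3))
```

On this synthetic block of horizontal lines, the largest feature should fall at index 3, the filter tuned to vertical frequencies under this sketch's angle convention. A hierarchical classifier of the kind described would then operate on vectors like these, first over the five major classes and then within each class; that stage is not sketched here.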


Affiliations:




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Script identification from Indian documents</title>
<author>
<name sortKey="Gopal Datt Joshi" sort="Gopal Datt Joshi" uniqKey="Gopal Datt Joshi" last="Gopal Datt Joshi">GOPAL DATT JOSHI</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Visual Information Technology, IIIT Hyderabad</s1>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Centre for Visual Information Technology, IIIT Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Garg, Saurabh" sort="Garg, Saurabh" uniqKey="Garg S" first="Saurabh" last="Garg">Saurabh Garg</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Visual Information Technology, IIIT Hyderabad</s1>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Centre for Visual Information Technology, IIIT Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sivaswamy, Jayanthi" sort="Sivaswamy, Jayanthi" uniqKey="Sivaswamy J" first="Jayanthi" last="Sivaswamy">Jayanthi Sivaswamy</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Visual Information Technology, IIIT Hyderabad</s1>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Centre for Visual Information Technology, IIIT Hyderabad</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">08-0029049</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 08-0029049 INIST</idno>
<idno type="RBID">Pascal:08-0029049</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000302</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000482</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000314</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Gopal Datt Joshi:script:identification:from</idno>
<idno type="wicri:Area/Main/Merge">001175</idno>
<idno type="wicri:Area/Main/Curation">001152</idno>
<idno type="wicri:Area/Main/Exploration">001152</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Script identification from Indian documents</title>
<author>
<name sortKey="Gopal Datt Joshi" sort="Gopal Datt Joshi" uniqKey="Gopal Datt Joshi" last="Gopal Datt Joshi">GOPAL DATT JOSHI</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Visual Information Technology, IIIT Hyderabad</s1>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Centre for Visual Information Technology, IIIT Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Garg, Saurabh" sort="Garg, Saurabh" uniqKey="Garg S" first="Saurabh" last="Garg">Saurabh Garg</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Visual Information Technology, IIIT Hyderabad</s1>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Centre for Visual Information Technology, IIIT Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sivaswamy, Jayanthi" sort="Sivaswamy, Jayanthi" uniqKey="Sivaswamy J" first="Jayanthi" last="Sivaswamy">Jayanthi Sivaswamy</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Visual Information Technology, IIIT Hyderabad</s1>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Centre for Visual Information Technology, IIIT Hyderabad</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Archive</term>
<term>Character recognition</term>
<term>Classification scheme</term>
<term>Data processing</term>
<term>Document analysis</term>
<term>Document processing</term>
<term>Document structure</term>
<term>Gabor filter</term>
<term>Hierarchical classification</term>
<term>Image processing</term>
<term>Image segmentation</term>
<term>Multichannel filter</term>
<term>Multilingualism</term>
<term>Multiscale method</term>
<term>Optical character recognition</term>
<term>Optimal design</term>
<term>Optimization</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Text</term>
<term>Very large databases</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance forme</term>
<term>Analyse documentaire</term>
<term>Structure document</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Texte</term>
<term>Traitement document</term>
<term>Segmentation image</term>
<term>Traitement image</term>
<term>Traitement donnée</term>
<term>Base donnée très grande</term>
<term>Multilinguisme</term>
<term>Archive</term>
<term>Plan classement</term>
<term>Classification hiérarchique</term>
<term>Filtre multicanal</term>
<term>Extraction forme</term>
<term>Filtre Gabor</term>
<term>Optimisation</term>
<term>Conception optimale</term>
<term>Méthode échelle multiple</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Multilinguisme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this paper, we present a scheme to identify different Indian scripts from a document image. The scheme employs hierarchical classification using features consistent with human perception. These features are extracted from the responses of a multi-channel log-Gabor filter bank designed at an optimal scale and multiple orientations. In the first stage, the classifier groups the scripts into five major classes using global features. In the second stage, a sub-classification is performed based on script-specific features. All features are extracted globally from a given text block, so no complex and reliable segmentation of the document image into lines and characters is required. The proposed scheme is therefore efficient and can be used in many practical applications that require processing large volumes of data. The scheme has been tested on 10 Indian scripts and found to be robust to the skew introduced during scanning and relatively insensitive to changes in font size. The proposed system achieves an overall classification accuracy of 97.11% on a large test data set. These results establish the utility of a global approach to script classification.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Inde</li>
</country>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Gopal Datt Joshi" sort="Gopal Datt Joshi" uniqKey="Gopal Datt Joshi" last="Gopal Datt Joshi">GOPAL DATT JOSHI</name>
</noRegion>
<name sortKey="Garg, Saurabh" sort="Garg, Saurabh" uniqKey="Garg S" first="Saurabh" last="Garg">Saurabh Garg</name>
<name sortKey="Sivaswamy, Jayanthi" sort="Sivaswamy, Jayanthi" uniqKey="Sivaswamy J" first="Jayanthi" last="Sivaswamy">Jayanthi Sivaswamy</name>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001152 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001152 | SxmlIndent | more
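Once a record has been extracted as text, it can also be read with Python's standard `xml.etree.ElementTree`. The sketch below is an illustration, not part of Dilib. One complication: as shown on this page, the record uses the `inist:` and `wicri:` prefixes without declaring them, so a strict XML parser rejects it. The sketch works around this by injecting namespace declarations whose URIs are hypothetical placeholders. The function name `parse_record` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# The record as shown uses the inist: and wicri: prefixes without declaring
# them, so a strict XML parser rejects it. These namespace URIs are
# hypothetical placeholders, added only so the text parses.
NS_DECL = ('<record xmlns:inist="urn:example:inist" '
           'xmlns:wicri="urn:example:wicri">')


def parse_record(text):
    """Return (title, author names, English keywords) from a record string."""
    root = ET.fromstring(text.replace("<record>", NS_DECL, 1))
    title_stmt = root.find("TEI/teiHeader/fileDesc/titleStmt")
    title = title_stmt.findtext("title")
    # Each <author> in titleStmt holds one <name> element
    authors = [a.findtext("name") for a in title_stmt.findall("author")]
    # Only the English keyword list (scheme="KwdEn")
    keywords = [t.text for t in
                root.iterfind(".//keywords[@scheme='KwdEn']/term")]
    return title, authors, keywords
```

For example, `parse_record(open("record.xml", encoding="utf-8").read())` would return the title, the three author names and the English keywords, where `record.xml` is a hypothetical file holding exactly one record of the form shown above.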

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:08-0029049
   |texte=   Script identification from Indian documents
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024